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Abstract 

A Bayesian nonparametric method for unimodal densities on the real line is 
provided by considering a class of species sampling mixture models containing 
random densities that are unimodal and not necessarily symmetric. This class 
of densities generalize the model considered by Brunner (1992), in which the 
Dirichlet process is replaced by a more general class of species sampling models. 
A novel and explicit characterization of the posterior distribution via a finite 
mixture of two dependent S-paths is derived. This results in a closed-form and 
tractable Bayes estimator for any unimodal density in terms of a finite sum over 
two S-paths. To approximate this class of estimates, we propose a sequential 
importance sampling algorithm that exploits the idea of the accelerated path 
sampler, an efficient path-sampling Markov chain Monte Carlo method. Numer- 
ical simulations are given to demonstrate the practicality and the effectiveness 
of our methodology. 

A Bayesian nonparametric method for unimodal densities on the real line is pro- 
vided by considering a class of species sampling mixture models containing random 
densities that are unimodal and not necessarily symmetric. This class of densities 
generalize the model considered by Brunner (1992), in which the Dirichlet process 
is replaced by a more general class of species sampling models. A novel and explicit 
characterization of the posterior distribution via a finite mixture of two dependent 
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S-paths is derived. This results in a closed-form and tractable Bayes estimator for 
any unimodal density in terms of a finite sum over two S-paths. To approximate 
this class of estimates, we propose a sequential importance sampling algorithm that 
exploits the idea of the accelerated path sampler, an efficient path-sampling Markov 
chain Monte Carlo method. Numerical simulations are given to demonstrate the 
practicality and the effectiveness of our methodology. 

1 Introduction 

Statisical theory usually assumes that data come from a distribution that is symmet- 
ric and unimodal at zero, such as a normal distribution or a Student's t distribution. 
However, it is common in real- life applications that underlying distribution of re- 
sponse variable, even though unimodal, may not be symmetric about its mode which 
is different from zero. For more information, see Dharmadhikari and Joag-Dev (1988) 
and Bertin, Cuculescu and Theodorescu (1997). There is a vast amount of literature 
on nonparametric estimations of unimodal densities and the mode from a frequen- 
tist viewpoint including early works of Granander (1956), Parzen (1962), Cher- 
noff (1964), Robertson (1967), Venter (1967), Prakasa Rao (1969), Wegman (1969, 
1970a, 1970b, 1971), other further studies by Lye and Martin (1993), Bickel and 
Fan (1996), Wang (1996) and Birge (1997) and among others. Some recent methods 
are, for example, a recursive method in Cheng, Gasser, and Hall (1999), kernel-based 
methods in Hall and Huang (2001, 2002), and other parametric models in Fernandez 
and Steel (1998), Jones (2004) and Ferreira and Steel (2006). 

From a Bayesian viewpoint, Brunner (1992) gave a nonparametric solution to 
the problems by assuming a mixture representation same as that in wherein 
the mixing distribution G is a Dirichlet process (Ferguson (1973)) for a unimodal 
density with a general mode 6 on the real line TZ. The posterior distribution and the 
Bayes estimate of the unimodal density can be characterized in terms of random 
partitions (see, e.g, Lo (1984) and Lo, Brunner and Chan (1996) for these well- 
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established theoretical results on Dirichlet process mixture models). 

This paper is devoted to improving Brunner's results and developing an efficient 
numerical method for practical usage of the Bayes solutions. A class of unimodal 
densities with mode at 6 of interest is defined by 



f{t\G, e)= —[!{{) <t - e < X) - i{x <t - e <{))] G{dx), t e n, (i) 



where I{B) is the indicator of an event B and G is from the class of species sampling 
models developed in Pitman (1995, 1996), of which the Dirichlet process is a member. 
All the results follow are therefore applicable to Brunner's model as his model is 
a special case of (^. The validity of the mixture representation for all unimodal 
densities given in the right hand of (0) can be justified by noting equality between 
its integral when ^ = and the distribution function of any unimodal density with 
mode at zero given in Feller (1971, page 158). 

The posterior distribution of like Brunner's model, can also be characterized 
in terms of random partitions, as the models are special cases of the species sam- 
pling mixture model defined in Ishwaran and James (2003) which takes the same 
form as O with the kernel X'^ [I{0< t - 9 < X) - I{X <t-e <0)] replaced by 
any density function in t given 6 and X. In this work, by utilizing the special and 
nice features of the kernel in (see 0) and noticing irrelevancy of some infor- 
mation carried by a partition in characterizing the posterior distribution, we are 
able to refine the partition-based results to show that the unimodal densities pos- 
sess special structures related to two S-paths, where an S-path is a random vector 
defined in Brunner and Lo (1989) (see also Dykstra and Laud (1981). Generally 
speaking, there exists a tractable characterization of the posterior distribution via 
some combinatorial structures that are considerably less complex than partitions. 
Such a characterization is known to be the first explicit type that is based on two 
S-paths. Similar phenomena based on one single S-path could be found in Bayes es- 
timations of symmetric unimodal or decreasing densities by Brunner and Lo (1989), 
Brunner (1995) and Ho (2006b) and monotone hazard functions by Dykstra and 
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Laud (1981), Lo and Weng (1989) and Ho (2006a), as the space of S-paths is con- 
siderably smaller than that of partitions (Brunner and Lo 1989). Intuitively, this 
characterization depending on two S-paths can be explained by the fact that there 
are two (possibly different) non-increasing curves on each side of the mode in uni- 
modal densities, but not only one (identical on either sides) in symmetric unimodal 
densities of which can be characterized in terms of one S-path (Albert Y. Lo, private 
conversation). 

It is recognized that if one could efficiently sample the two S-paths in this con- 
text, this would lead to more parsimonious methods for inference. Motivated by 
the co-existence of and the resemblance in constructions of an SIS algorithm and 
a Gibbs sampler for sampling random partitions in many Bayesian mixture mod- 
els (Lo, Brunner and Chan (1996) and Ishwaran and James (2003)), we propose (in 
Section ^ a novel sequential importance sampling (SIS) method (Kong, Liu and 
Wong (1994) and Liu and Chen (1998)), dubbed sequential importance path (SIP) 
sampler, for sampling directly one single S-path in the aforementioned models under 
monotonicity constraints by borrowing the idea behind the success of an efficient 
Markov chain Monte Carlo (MCMC) method introduced in Ho (2002, 2006a, 2006b) 
that serves the same purpose. Then, a natural SIS scheme based on applications of 
the SIP sampler is introduced for sampling the unknown mode and the two S-paths 
in evaluating/approximating posterior quantities for models in 

1.1 Some backgrounds on species sampling models 

Pitman (1995, 1996) developed the class of species sampling models that corresponds 
to the set of all random probability measure of the form 



where < Wk < 1 are random weights such that Wk < 1, independently of 
Vk, which are i.i.d. random variables with some non-atomic distribution H over a 
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measurable Polish space, and Sy^i') is a Dirac probability measure degenerate at Vk- 
This class includes a large number of well-known random processes, for instance, 
the Dirichlet process and its two-parameter extension, called the two-parameter 
Poisson-Dirichlet process (Pitman and Yor (1997)), the class of finite-dimensional 
Dirichlet priors discussed in detail in Ishwaran and Zarepour (2002a, 2002b), and the 
homogeneous normalized random measures with independent increments discussed 
in Regazzini, Lijoi and Priinster (2003). 

Suppose X = {Xi, . . . ,Xi\i) is a random sample from @. The joint marginal 
distribution of X is determined by the prediction rule, Pr {Xi € ■} = H{-) and 

PT{Xk+ie-\Xi,...,Xk}=£o,kH{-)+Y,kk^x;{-), k = 2,...,N-l, (3) 

where H is non-atomic and £o,A; and ij^k are non-negative measurable functions 
of The above prediction rule conveys that given which 

correspond to A''^ unique values X^, . . . ,X^^ of respective numbers of duplicates 
ei, . . . , CATj. , then the next observation X^+i takes the same value as Xj with prob- 
ability ij^k, j = 1, • • • ,Xk; otherwise it takes a new value from H with probability 
£o,fc- As a consequence of the exchangeability of (Xi, . . . , X]\f), Pitman (1996) shows 
that the distribution of Xi, . . . ,Xn, denoted by fi{dX.), is uniquely characterized 
by the joint law of its unique values and an exchangeable partition probability func- 
tion (EPPF) 

7r(p) = x(ei, • • • ,e7v(p)) (4) 
induced by the unique values. That is, 

Nip) 

l^{dX)= Trip) n H{dXl), 

k=l 

where p = {Ci, . . . , C7v(p)} of the integers {1, . . . , N} is a partition of A^(p) cells 
induced by = {j : Xj = X^} and x is a unique symmetric function depending 
only upon e^, the number of elements in or the size of Ck, k = 1, . . . , A'^(p) (see 
Pitman (1996) and Ishwaran and James (2003, Section 2) for more information). 
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2 A posterior distribution via S-paths 

This section concerns Bayes estimation of a unimodal density on the hne TZ with a 
general mode 9, defined by the species sampling mixture model in (^. Suppose we 
observe i.i.d. observations T = (Ti, . . . , T/v) from and assume any prior TT{d9) 
for 9. Given T, denote V{dG\9,T) and 7'(d6'|T) as the posterior distribution of G 
given 9 and the posterior distribution of 9, respectively. The posterior distribution of 
the pair (G, 9) in can always be determined by the double expectation formula, 



where h is any nonnegative or integrable function and M is the space of probability 
measures over IZ. Let us first look at V{dG\9, T) and then discuss V{d9\'T) later on. 
Suppose 9 is given. We can always assume that 



(Ti - 0, . . . - = Z U Y = {ZN-n,ZN-n-l, . . . , ^l) U {¥^,¥2, . . .,Yn), (6) 



where ZM-n < -^iv-n-i < • • • < < and Q < Yi < Y2 < ■ ■ ■ < Yn- Denote 
the missing variables in by X = (Xi, . . . ^X]\f). It is worthy of note that once 
an observation is taken from the kernel can be well-simplified according to two 
mutually exclusive situations, that is, the likelihood of an observation in T is 



The distinctiveness of the kernel yields a similar simplification (see ()33p and ()34() ) 
in the posterior distribution of G given 9 in terms of partitions p of the inte- 
gers {!,... , A^} in (|32j) . readily available from Theorems 1 and 2 in Ishwaran and 
James (2003). This implies that the n resulting positive observations after sub- 
traction of 9, Yi,...,!^, can only "cluster" with one another but not any neg- 
ative observation or vice versa. Hence, it is eligible to "split" the partition p of 
the integers/observations into two non-overlapping partitions p^ and p^. Write 




given by 
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p = p+ U p~. Without loss of generality, assume that p"*" = {Ci, . . . , CAr(p+)} de- 
notes the partition of the n positive observations and p^ = {C7v(p+)+i, • • • , C'/vr(p)} 
of the remaining N — n negative observations. Define 




7r(p |p+) X 7r(p+). 



(9) 



These, together with the facts that the second line of Q resembles, while the other 
line is symmetrical to, the scaled mixture of uniform representation of a symmetric 
unimodal density with mode at zero due to Khintchine (1938) and Shepp (1962), 
yield a posterior distribution of G given 9, which is expressible in terms of two de- 
pendent S-paths, as a consequence of applications of Theorem 2.1 and Corollary 2.2 
in Ho (2006b). 

Let us fix some notation before stating the main results. Define an integer- 
valued vector S = {Sq, Si, . . . , Sn-i, Sn) (Dykstra and Laud (1981) and Brunner 
and Lo (1989)), referred to as an S-path (of n + 1 coordinates), which satisfies (i) 
5*0 = and 5„ = n; (ii) Sj < j, j = 1, . . . ,n — l; and (iii) Sj < Sj^i, j = 1, . . . , n — 1. 
A path S is said to correspond to one or many partitions p of the integers {1, . . . , n}, 
provided that (i) labels of the maximal elements of the A^(p) cells in p coincide with 
locations j at which Sj > Sj-i, and (ii) size of the cell for all A; = 1, ... , A^(p) 
with a maximal element j, j = 1, . . . ,n, is identical to Sj — Sj-i. Let Cs denote 
the collection of partitions that correspond to a given S. Then, the total number of 
partitions in Cs is given by (Brunner and Lo (1989)) 




(10) 



See Ho (2002) for more discussion of the relation between p and S. Following from 
the symmetric definition of x in Qi we have 



7r(p) = x(ei, . . • , eAr(p)) := x(A4i,„(S)) 



if P G Cs 



(11) 
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where, for any integer 1 < a < b < n, 

Ma,b{S) = {Sj - Sj-i : Sj > Sj-i,j = a,a + 1,... ,6}. 

Write as summing over all paths S of the same number of coordinates, and 

rio-is} and E{j*|s} as Il]=i:S,>s,-i ^^'^ E"=i:5,>s,_i conditioning on S, respec- 
tively. 

Theorem 2.1. Suppose 6 is given and T are N i.i.d. observations from That 
is, © holds. Then, the distribution of X given 9 and T can be summarized by a 
joint law of (V, U), (S^,S^)10,T defined as follows. 

(i) Given {9,T), two paths S+ = (0, Sf, . . . , S:^_^, n) and S~ = (0, 5f , . . . , 

N — n) of n + 1 and N — n + 1 coordinates, respectively, have a (discrete) joint 
distribution ^(S-,S+|^,T) oc 0+(S+,T) x (/>-(S-, S+, T), where 

0+(S+,T) = |Cs+|x(-Mi,n(S+)) n / Uj"- '~ '-' H{dU,) (12) 

and 

0,-(S-,S^T) = |Cs-| ^^^^-"f "^'-^^VV^''^^ n ri-V^-^'^-'-^HidV^ 

(13) 

with |Cs| and x(') defined in (jTUI) and ((TT|) . respectively. 

(ii) Given (S-,S+) and (^,T), there exist iV(S+) = ^^^^ 1(5"+ > S+_^) posi- 
tive and A^(S^) = Ylf=i^ ^^'^i ^ "^/-i) negative unique values on TZ among 
{Xi,..., Xn}, denoted by U = {Uj : 5+ > S^_^,j = 1, . . . , n} and V = {Vj : 

> Sj_^,j = 1, . . . , A^ — n}, respectively. They are distributed, conditionally 
independent of one another, as 

H+{dUj\S+,Y) oc l{Yj < Uj) U~^^^'^^-'-'^H{dUj), (14) 

and 

HT{dVj\S-, Z) a l{Vj < Zj) {-VjY^^i '^^-^^ H{dVj), (15) 

respectively. 
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Theorem 2.2. For any nonnegative or integrable function g, the law of G given 9 
and T is characterized by 



g{G)V{dG\9,T) 

M 



EE 

s+ s- 



1/ g{G)VidG\V,U,S-,S+,e,T)] 



7^iV(s+)+jv 



n H-{dV,\S-,Z) n H+{dUj\S+,Y) 
{iis-} {r\s+} 



7r(S-,S+[0,T),(16) 



where V, U, , S+, 6*, T) is equivalent in distribution to P(dG|X*, p, 6*, T) 

given in (|32p and vr(S^, S^|0, T) is defined in Theorem I2.1f i). 

The above characterization of the posterior distribution of G given 6 for models 
in that is in terms of two S-paths is less complex than (or as complex as only 
when n,N — n < 4) the partition-based characterization H32|) (see Remark 12.71 for 
discussion in detail). A proof of the above two theorems is given in the Appendix. 

Given any path and of n + 1 and N — n + 1 coordinates, respectively, 
define 

r/o(S^,S = — — — 7^3:vr> (17) 

for J = 1 Ti 

„+.S+sV - - SU},M,,r,^AS-), Sf - Sl_, + 1) 

' ' ~ x(-Mi,n(S+),A^i,^_„(S-)) '^^^^ 

if 5^ > 5'^-^, otherwise, and, for j = 1, . . . , — n, 

_ x(A4i,n(S+),A^i,^-.(S-)\{5- - Sr_^,S- - S-_, + 1) 



if S7 > Sr„ 



otherwise. 



Corollary 2.3. Theorems 12 . 1 1 and 12 . 21 implv that a Bayes estimate of the unimodal 
density is given by the posterior mean given 9 and T, 

E[/(t[G, 9)\9, T]=Y,Y. ^' T)7r(S-, S+|0, T) (20) 



10 



Ho 



where 



af{t\S-,S+,e,T) 



+ 



%(S+, S-) 4o(t) + r?+(S+, S-) 4/t|S+) 

N-n 

r?o(S+, S-) d, ,{t) + ^7(S+, S-) d^/t\S 



lit > 9) 

l{t < 9), (21) 



with = = for j = 1, 



and, for j = 1, . . . , 



max(t—9,Yj 



U-(^t-'U+^)H{dU) 



H{dU) 



0, 

N-n, 



jrnn^(t-B,Z,)^_yy(S7-S-_,+l)jj^^y^ 



!^^^{-V)-^'^-'i--^H{dV) 



0, 



^3 > "^i-i' 



otherwise, 



otherwise. 



The above Bayes estimate is a weighted average of the function aj(t|S^, S^, 9, T) 
with respect to 7r(S^, S^|^, T). When H is defined by H31|) and the prior compo- 
nents, ryo(S+, S^) (ij"o(*) ^'^'i S^) dg^it), vanish, the function aj becomes con- 
stant between different ordered observations, which is of the same form as Robert- 
son (1967)'s maximum hkehhood estimate of the unimodal density when the mode 
is known as 9. 

Dividing the right hand side of (p6|) by the joint distribution of (V, U, S^, S^) 
given 9 and T, given by the last fine in ()16() . yields the following analogue of 
Lemma 2.1 in Ho (2006a) or Corollary 2.4 in Ho (2006b) which states that given 
(S^,S+,0,T), p is uniformly distributed over all partitions that can be split into 
p+ and corresponding to the given paths S^" and S~, respectively. The above 
estimator for a unimodal density follows from the same argument as in Ho (2006b) 
to be always less variable than its counterpart in terms of partitions due to (jS21) as 
a result of Rao-Blackwellization. 
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Corollary 2.4. Consider models in ((T]). Suppose S+|0,T 7r+(S+|6',T) and 
S^|S^,0,T 7r^(S~|S^, ^, T). Then, there exists a conditional distribution 

7r(p|S-, S+, 0, T) = — , p = p+ U p-, p+ G Cs+ , p- G Cs- , 

where |Cs| is given by (fTU]) . 

Suppose 9 is not known. Theorems 12 . II and 12 . 21 yield the conditional density of T 
given 6 to be proportional to J2s+ Ss- 't'ti^^^ T) (pg {S^ , S+, T). A standard prior- 
posterior updating operation in which the prior distribution of 9 is Tr{d9) results in 
the following theorem. 

Theorem 2.5. Assume any prior TT{d9) for 9. Then, the posterior distribution of 9 
given N i.i.d. observations T from is characterized by 

Pr{9eB\T)= [ J^J^7r(S-,S+,d0|T), (22) 
for any Borel set B TZ, where 



s+ s- 



^fS- S+ H91T) - </',+ (S+,T)^-(S-,S+,T)vr(dg) 

MS ,S '^^|T)-^^^^^^^_^+(s+,T)0,(S-,S+,T).(<i^)' ^''^ 

with (/>+(S+,T) and (/>g (S", S+, T) defined in (HH) and (O, respectively 

Finally, the posterior distribution of the pair (G, 9) in can be determined 
from © based on Theorems 12 . 21 and 12 . 51 and . hence, the posterior expectation of any 
functional h of (G, 9) can be expressible in terms of a finite sum over two dependent 
S-paths. In particular, a Bayes estimate of the unknown unimodal density follows 
by letting h{G,9) = f{t\G,9) in © and applying CorollarvIO 

Theorem 2.6. Assume any prior Tr(d9) for 9. Then, the posterior mean of an 
unimodal density ^ given i.i.d. observations T is given by 

E[f{t\G,9)\T]= f J]j;a;(t|S-,S+,0,T)7r(S-,S+,d0|T), (24) 

where o/(t|S-, S+, 6*, T) and 7r(S", S+, d6'|T) are defined in Corollary O and CT . 
respectively. 
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Remark 2.7. As the total number of partitions of N integers, which is the Bell's ex- 
ponential number B]\f, is roughly equal to A^!, the complexity of the partition-based 
characterization (jHSJ, which relies on the total number of partitions p (that is, the 
number of summands in X^p), is roughly equal to n! x {N — n)l. This quantity is iden- 
tical to that of Brunner's model, but it has not been pointed out in Brunner (1992). 
Meanwhile, the complexity of the path-based characterization (|16j) . which is based 
on the number of summands in the double sum J2s+ X^s-' depends on A„ x Ajv_„ 
where A„ denotes the total number of S-paths of n + 1 coordinates. Hence, its com- 
plexity is less than that of H32|) except when both n and N — n are less than 4 because 
An < Bn for all integers n, with equality only when n < 4 (Brunner and Lo (1989) 
and Ho (2002)). Tabled reveals a ratio between the complexities of ()16() and (|32() to 
be as large as 0.02097 when A^ = 20. This upper bound on the ratio drops quickly 
when A^ increases; for example, the bound is given by 0.00013^ = 1.69 x 10~^ when 
N = 40. 



Table 1: Complexities between path-based and partition-based characteriza- 
tions, H16|) and (|32() . versus sample sizes n and 20 — n. 



n 


20 - n 


An X A20-n 


B-ri X B2Q-n 


Ratio in % 


10 


10 


282,105,616 


13,450,200,625 


2.097 


8 


12 


297,457,160 


17,444,291,580 


1.705 


6 


14 


353,026,080 


38,752,562,366 


0.911 


4 


16 


495,007,380 


157,202,132,205 


0.315 


2 


18 


1,432,916,100 


2,046,230,418,477 


0.070 





20 


6,564,120,420 


51,724,158,235,372 


0.013 



2.1 An illustration with the two-parameter Poisson-Dirichlet pro- 
cess 

This section illustrates results obtained so far by selecting an important example 
of the class of species sampling models ((21), namely, the two-parameter Poisson- 
Dirichlet process (Pitman and Yor (1997)). Write the random measure as VT){H; a, h) 
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to indicate that its shape probabihty is H and there are two shape parameters 
< a < 1 and b > —a. A Dirichlet process with shape measure OH, 6 > 0, cor- 
responds to VT){H;{),9). Selections of a = a and 5 = give a normaUzed stable 
law with index < a < 1, of which a simple exponential change of measure gives a 
normalized inverse-Gaussian process considered by Lijoi, Mena and Priinster (2005). 
Posterior analysis of models in wherein G is VT){H; a, b) follow from the previ- 
ous discussion with explicit simplifications including ^o,fc = {b + Nka)/{b + k) and 
ij,k = {ej - a)/ (6 + A;) in 0, 

^^^^"^^'^^ = 

in jnHni), 

x(A4i,n(S+), M,,N^n{S-)) n^Pib + (iV(S+) + i - l)a] U{r\s-} I^ii - «) 

in m, %(S+, S-) = [6+ (iV(S+) + N{S-))a]/{b + N), r?+(S+, S") = (5+ - Sl_, - 
a)/{b + N), and rjj {S+,S-) = {Sj - Sr_^ - a)/{b + N) in CZHISl), respectively. 
Last but not least, in Theorem 12.21 V{dG\V , U, S^, S+, 0, T) is equivalent to 

E E ^^v,(-)+fi- E §- E 

{i*|s+} {j*|s-} V {j-|s+} {j'|s-} / 

where Gamma{S^ — S^_^ — a), GJ '~ Gamma{Sj — SJ_^ — o), G* = 

S{j*is+} + GJ + G with G ~ Gamma(6+ (iV(S+) + iV(S-))a), and all 

variables are mutually independent of V* = VV{H; a,b+{N{S+) + N{S-))a). Note 
that the above new expression of l|13|) is a symmetric function, yet different from 
that in ((T^ . depending on {Sj — Sj_-y : Sj > Sj_^,j = 1, . . . , N — n} only. This 
allows a straightforward application of the SIP sampler (Algorithm 13. 1|) in drawing 
S~ given S+ when constructing an SIS method (Algorithm 13. 4|) for models in (Q) in 
the next section. 
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3 Sequential Importance Sampling Schemes 

This section introduces a SIS method (Kong, Liu and Wong (1994), Liu and Chen (1998) 
and Liu, Chen and Wong (1998)) for sampling the triplets {S~,S^,9) in evaluat- 
ing/approximating posterior quantities for models in like ()22() and ()24|1. which 
are expressible in terms of finite sums of two dependent S-paths. The SIS method is 
based on yet another novel SIS algorithm, called sequential importance path (SIP) 
sampler, for sampling one single path at a time. The SIP sampler is designed in 
accordance with choosing trial distributions that mimic the probability kernels for 
Markov transitions in the accelerated path (AP) sampler introduced in Ho (2002, 
2006a, 2006b) that serves the same purpose. 

Generally speaking, the SIP sampler or any other existing SIS method allows us 
to draw an S-path of n + 1 coordinates according to a probability distribution 

7r(S) cx cPiS) = |Cs|x(Xi,n(S)) n m(^^-^^-)(Q,), (25) 

{i*|s} 

where |Cs| is defined in (jlOj) . xi') is a symmetric function depending on only its 
arguments, similar to (|1H) . m^^^^^^^'^\Qj) is a finite real-valued function depending 
on Sj — Sj-i and Qj only, and Qi,. . . ,Qn is a decreasing/increasing sequence in 
TZ. An inefficient SIS method proposed by Ho (2002, Section 4) consists of n — 1 
recursive determinations of one coordinate of the path S at a time in an ascending 
order conditioning on all previously determined coordinates according to a trial 
distribution 

Pr(5r = Sr\Si = Si, . . . , Sr-l = Sr-l) := tr{Sr\si, Sr-l) OC </>(s* ,,J (26) 

for r = 1, . . . , n — 1, where s* = (0, si, . . . , s^-i, Sr,r + 1) is a path of r + 2 
coordinates. After step n — 1, a path s = (0, si,S2, . . . ,Sn-i,n) drawn with prob- 
ability tn-i(s) = nr=i ^'■(^'■1^1' ■ ■ ■ ' then be treated as a Monte Carlo 
sample from (|25|) after being properly weighted by an importance sampling weight 
0(s)/t„_i(s). However, it turns out that the above scheme is practically not efficient 
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in evaluating sums over S-paths. In general, this is directly related to the discrep- 
ancy between the trial distribution ti{-\-) in (|2H|) and the true conditional distribu- 
tion of Sr given Si, ... , Sr-i derivable from the target distribution '7r(S) (Liu and 
Chen (1998)). Noticing that the "transition" in (|26() is equivalent to determinations 
of the two increments, Sr — Sr^i and r + 1 — Sr, of the path at locations r and 
r + 1, respectively, our idea is to replace location r -|- 1 by some other latter location 
q, which parallels the idea of constructing the AP sampler adopted in Ho (2002, 
2006a, b) when improving on an inefficient Gibbs chain. Let /q = and /„ = n and 
denote {Ii, . . . , In-i} as a random permutation of the integers {l,2,...,n — 1}, such 
that Dr = {/q} U {/i, . . . , Ir} U {/„} consists of all determined coordinates of the 
S-path after step r of the SIP sampler, for r = l,...,n — 1, 

Algorithm 3.1 (Sequential importance path (SIP) sampler). An efficient 
SIS method for sampling an S-path of n -|- 1 coordinates from 7r(S) given in (|25|) . 
the SIP sampler, consists of recursive applications of the following SIS steps for 
r = 1, . . . , n — 1: 

A. Given Dr-i, let p = max{/j G Dr-i : Ij < Ir} and q = min{/j S Dr-i ■ 
Ij > Ir}- Determine Sj^ = k, for k = Sp, Sp + 1, . . . , mm{Ir, Sq), according to 
a distribution 

Kr{k\{Sh ■■ h e Dr-^l}) OC 0(8}^^;,), (27) 

where S}_, = {0,SI,..., Sl_-^, S}^, Sl^-^, S*_-^,n) is a path of n + 1 coor- 
dinates such that Sj^ = k and for i = 1, . . . , Ir — 1, Ir + 1, ■ ■ ■ ,n — l, S* = Si^ if 
i = Ih & Dr-i; otherwise, S* = S*_^ (see Remark llH.i-il for explicit expressions 
of Kr{k\{Sh '. h € Dr-i}) for different values of k). 

B. Compute Kr{k\{Sh ■ h G Dr-i}), equals 4>{S*j^ ^) multiplied by the appropriate 
constant of proportionality, for the chosen value k of Sj^. 
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After step n — 1, we obtain a random path S = (0, Si, S2, • • • , Sn~i,n) distributed 
as the trial distribution 

n-l 

Kn-l{S) = n Kr{Sj^\{Sh : h G D^-l}). (28) 

Hence, its importance samphng weight is given by Wn-i{S) = (/)(S)/k„_i(S). Given 
M i.i.d. draws, S(i), . . . , S(7v/) with respective importance sampling weights 
t(;„_i(S(i)), . . . , w„_i(S(7(/f)), from the SIP sampler based on different permutations 
{/i, . . . , /n-i} of the n — 1 integers, any sum over S-paths/expectation of any func- 
tional h{S) with respect to the probability distribution 7r(S), -q^ = /i(S)7r(S), 
can be approximated by 



Y. iilKS{i))Wn-l{S (i) ) 
E»=l^^n-l(S(i)) 



^ ^.^^^>.-n-..^^>. _ (29) 



Remark 3.2. We remark that there are two major differences between the SIP 
sampler and the inefficient SIS method which intuitively explain why the SIP sam- 
pler is more efficient. On one hand, the coordinates of the path are determined in 
a random order in the SIP sampler, but not in an ascending order or any other 
pre-determined order. This arrangement is desired and crucial, as it results in de- 
termination of an increment at a location possibly latter than r -|- 1 in step r, which 
is the idea behind the success of the AP sampler. On the other hand, each trial 
distribution K.r{Si^\{Sfi : h € Dj—i}) in the SIP sampler mimics the transition prob- 
abilities in the AP sampler, in the sense that it is proportional to the probability of 
a path of n -|- 1 coordinates for any r = 1, . . . ,n — 1, rather than the probability of 
a path of number of coordinates varying with r. 

Remark 3.3. In the SIP sampler (Algorithm 13. If) , the trial distribution Kr{k\{Sh '■ 
h S Dr-i}) is explicitly proportional to 

if A; = Sp, or 



Unimodal Density 



17 




Sq — Sp — 2\ -r-r f i — k \ 



if A; = + 1, . . . , mm(/r, Sq — 1). 

Algorithm 3.4. An SIS method that samples (S^, S+, 9) from (|23p consists of three 
major steps: 

(i) Sample 9 according to a density p{9) > 0, 9 £ TZ. Then, define Y and Z 
accordingly based on (jH)). Also, choose random permutations of{l,...,n — 1} 
and {1, . . . , - n - 1}. 

(ii) Given 9, determine S"*" by applying Algorithm 13. II with function 0(S) defined 
by (/)+(S+,T) in (O. Obtain «;„_i(S+|6l) according to 

(iii) Given (S^, ^), determine S~ by applying Algorithm 13.11 with function (j){S) de- 
fined by (pgiS-, S+, T) in provided that the ratio x(Xi,n(S+), >li,iv-„(S^))/x(7Wi,n(S+)) 
is a symmetric function of A4.i^i\f —n (S ). Obtain KAr„„_i(S [S+, 0) according 

to (UHl)- 

After a total of — 1 sub-steps, we obtain a random sample of (S^,S^,^) 
distributed as the trial distribution KAr_„_i(S^ [S^, ^) xk„_i(S+[0) x p{9). Ii-K{d9) = 
TT{9)d9, its importance sampling weight is given by 
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We remark that it is possible, indeed more desired in terms of efficiency of the SIS 
method, that the sequence in sampling the two paths in steps (ii) and (iii) can be 
randomized based on appropriate, but slight, modifications of the function 0(S) in 
applying Algorithm 13.11 That is, there is one-half probabihty that S"*" is sampled 
before S~ as stated in Algorithm 03J otherwise, S~ is sampled before S^. 

Corollary 3.5. Posterior quantities for models in (^, like (|22|1 and ()24() . which are 
expressible as 



Ih 

can be approximated by 



[ ^ J]]/i(S-,S+,0)^(S-,S+,d0|T) 
■^'^ s- s+ 



M _ E»=l ^(S(^) . S^) . ^(i) ) WN-l{S^.^ , S+ , g(^) ) 

E»=i^iv-i(S(^),s+ ,%)) 

where (S(^)i '^(i)' • • • i (^(m)' ^(a/)' ^(a/)) is a sequence of M i.i.d. samples from 

with respective importance sampling weights WN-i{S'^iy S^^, ^(i)), • • • , WN-i{S^j^jy S^^, ^(jv/))) 

obtained by carrying out Algorithm 13.41 independently for a large number of times 

M. 



4 Numerical Results 

This section concerns practical applications of our methodology. For purpose of 
illustration, G is selected to be the two-parameter Poisson-Dirichlet process as the 
corresponding results are discussed in Section 12.11 The idea of conjugacy suggests 
H(-) of the measure WIH; a, b) to be related to a Pareto distribution. In particular, 
we chose the following mixture of two Pareto random variables, symmetrical about 
zero, that is, 

H{dX) = jpj^ KX < -6) dX + liX > 6) dX, a,5>0, (31) 

such that it results in 

/ X-''H{dX) = / i-XyHidX) = , ^ . 

Jy J-oo 2(a + z.)max(|yU)"+- 
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for y > and any positive integer ly, which are necessary in implementation of 
Algorithm \'AA\ (or Algorithm IH.1|) . For purpose of "deflating" the prior belief, we 
choose a = 6 = 0.000001. Due to the same reason, the prior Tr{d9) is chosen to be 
uniformly distributed on a reasonably large interval on TZ such that all observations 
are included. The sequence in which the coordinates of the S-paths are determined, 
say, {h, . . . , In~i} for a path of n + 1 coordinates, is randomized in every application 
of the sequential algorithms. Likewise, the determinations of the two paths are also 
randomized in Algorithm 13.41 Last but not least, the Monte Carlo size M = 1000. 



4.1 Resolution of the SIP sampler 

This section addresses the performance of the SIP sampler which directly affects 
the SIS method (Algorithm 13. 4() for estimating a unimodal density. Based on a fixed 
and known mode our interest is to estimate the unimodal density with G 
taken to be the two-parameter Poisson-Dirichlet process with H in ()31|) . a = and 
6 = 1, given as in ^ with a/(t|S-, S+, 6*, T) defined by 6* = 6*0 together with the 
simplifications discussed in Section r2.1l To approximate the posterior mean, steps (ii) 
and (iii) in Algorithm 13. 4f , which are essentially two sequential applications of the 
SIP sampler, are implemented based on the known mode ^o- In particular, the 
convergence property of the approximated density estimate as the sample size 
increases is studied. 

Based on nested samples of sizes N = 500, 1000 and 3000 from a unimodal 



^As discussed after the introduction of Algorithm 13.41 the sequence of determinations of the two 
paths - first or S~ first - is randomized to achieve a higher efhciency. 
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density with [—1,0] as modal interval (Wegman 1970a) given by 



0.02 



-7 <t<-2 



0.1 



-2 < t < -1 



Ai(i) 



= < 



0.4 



-1< t < 



0.4exp(-t) 



t > 







otherwise, 



weighted averages that approximate the posterior mean of the unimodal density 
conditioning on 9 = Oq, given as in (|30|) . 



where t(;Ar_2(S^^, S^^|^o) is the importance sampling weight of the pair (S^^S^^ 
resulted from steps (ii) and (iii) of Algorithm 13.41 are displayed at the left columns 
in Figures foi" = —1,-0.5 (center mode), and 0, respectively. The whole 
procedure is repeated for the two-parameter Poisson-Dirichlet process with a = 0.9 
and b = 100. The density estimates based on the three selected values of the mode 
are depicted at the right columns in Figures The graphs echo the fact that the 
approximated Bayes estimate of the unimodal density, 7a^(t|6'o)i tends to the "true" 
unimodal density Ai(t) as sample size increases (from top to bottom in the figures) 
regardless of the two sets of parameters for G (between columns in the figures). 
When N is large, there is not much difference among density estimates based on 
different modes. 

4.2 Resolution of the SIS method (Algorithm 13.41) 

The practicality of the SIS method ( Algorithm ic. 4|) for estimation of a unimodal den- 
sity and its mode is addressed in this section. To estimate the unimodal density 
with G taken to the two-parameter Poisson-Dirichlet process with a = and b = 1, 
Algorithm 13.41 based on p{9) as a standard normal density is implemented indepen- 
dently for M = 1000 number of times to produce random samples of (S^,S^,^) 



E ■=! «/(^|S(;) , s+ , gp, T) WN-2{s^) , s+ \eo) 
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with importance sampling weight w^-iiS , S+, 9). According to Corollary 13.51 two 
weighted averages, defined as in (|nn|) . 



and 



E*=i «/ (* I , S+ , 0(^i) , T) WN-i{S,.. , S+ , ) 



Ei=i w^A'-i(S(,),S(.),%)) 

are used to approximate the Bayes estimates (the posterior mean given observa- 
tions) of the unknown mode 9 and the unknown unimodal density, respectively. 

The unimodal density Ai(t) in the previous section and another two unimodal 
densities are chosen as test densities. They are. 



hit) 



0.02 


-7 <t<-2 


0.25 


-2<t <0 


0.5 


< i < 0.1 


0.1 


0.1<t< 2.5 





otherwise. 



and 
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C (1.52;) I(-oo < X < 0) + C ( ) 1(0 < X < oo) 



where C(') is the density function of a standard Cauchy random variable. These 
three densities behave quite differently from one another in the sense that they have 
modal interval of length 1, modal interval of shorter length 0.1, and a unique mode 
at zero, respectively. 

Density estimates 7^(t) based on nested samples of sizes = 500, 1000 and 2000 
from the three unimodal densities are depicted in the left columns of Figures 14161 
respectively, while mode estimates 9^^ are presented in Table|2l The whole procedure 
is repeated with p{9) as a less diffuse normal density with mean and standard 
deviation 1/4. The resulting density estimates are depicted in the right columns 
of Figures EBHl while mode estimates are appended in Table l2j It is evident from 
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the mode estimates in Table |21 especially when N is not large, that approximation 
results based on p(6) with a smaller standard deviation are much better than the 
others. This is also supported by Figures I4I6I for instance, the peak at the modal 
interval [0,0.1] is not well-captured even when N = 2000 (graph at the bottom-left 
in Figure ISJ. This phenomenon can be addressed by the well-known fact (Kong, Liu 
and Wong (1994) and Liu, Chen and Wong (1998)) that efficiency of any SIS method 
depends heavily on whether the initial trial distributions in its early steps/stages is 
close to the true conditional distributions. Hence, a good choice of p{6) in step (i) 
of Algorithm 18.41 directlv affects the efficiency of the SIS method. 



Table 2: Weighted average estimates of the mode 
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N 


Ai(t) 




hit) 




500 


-1.249450 


-1.538695 


-0.230042 


N{0,1) 


1000 


1.068161 


0.079308 


0.815681 




2000 


-0.999335 


0.037052 


-0.615350 




500 


0.165998 


0.138150 


-0.071292 


Af(0,0.252) 


1000 


0.199645 


0.143027 


0.013269 




2000 


0.101668 


-0.071598 


-0.271294 


True mo 


ide 


[-1,0] 


[0,0.1] 






To explore the selection issue of p{9), we carry out a large-sample study by 
replicating the above procedure to estimate the mode of the unimodal density X^it) 
based on = 500 observations. Histograms of the 2000 independent Bayes estimates 
of 9 based on the two different p(^)'s are plotted in Figure[7| It is clear from the graph 
in the last row based on a standard normal density for p{9) does not give convincing 
posterior estimates of the mode. On the contrary, the graph in the second row shows 
that the true mode is well-captured when p{9) is less diffuse. This deficiency can 
be understood by looking at the histogram of the 500 observations in the first row 
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in Figure [7| the posterior distribution of the mode 6 should be quite concentrated 
around zero and, hence, the choice of a standard normal density for p{9) may be far 
too diffuse. Note that regarding the results for the first two test densities, the less 
diffuse normal density with standard deviation 1/4, symmetrical about zero, is not 
really close to the posterior distribution of 9 at all based on the histograms of the 
data in Figure |H1 This implies that it is not necessary to set p{9) to be extremely 
close to the true posterior distribution of 9, which is characterized in Theorem 12.51 
In conclusion, we suggest setting p{9) to be a density which is not too diffuse around 
the mode (based on information from the histogram of the data) in applying the 
SIS method (Algorithm 13. 4|) for estimating unimodal densities. 

Appendix: Proof of Theorem 12.11 and 12.21 

Proof. Suppose 9 is given. Theorem 2 in Ishwaran and James (2003) states that the 
law of G in given 9 and N i.i.d. observations T is characterized by 



for any nonnegative or integrable function g, wherein V{dG\'K* ,p,9,T) is deter- 
mined by Lemma 1 in Ishwaran and James (2003), and Hfcl^T^ fJ'idX^\Ck)Tr{p\9,T) 
is equivalent in distribution to the posterior distribution of X given 9 as discussed 
in Theorem 1 in Ishwaran and James (2003), where, for k = 1, . . . , N{p), fj,{dX^\Ck) 
is proportional to 





and n{p\9, T) = 7r(p) Uk^V I ^dX*,)/ Ep vr(p) UklV I ^dX*,). 
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Splitting p into p"*" and p as discussed in Section|21and re-expressing the kernels 
according to yield 

^tidX;) = 1(0 < maxy, < X*,) H[dXl) k < N{p^ 



fkidX;. 



{Xl)-^ 1(0 < maxy, < XI) H{dX*,) 
{-Xl)--'^ 1{XI < min Z, < 0) HidX*,] 



k > N{p+) 



(34) 



and 



7r(p|0,T) 



7r(p) 


USt^I^tidX*)] 








Ep^(p) 


nSt^-^tidx:) 

p-|p+,e,T) x^+(] 


3^ 


-|^,T) 



Ep+ Ep- ^-(P-|P+,^,T 



where ip~^{p'^\9,T) = vr(p+)J3^^^ ^ J (ff{dX*) defines a posterior distribution of 
p+ of the n positive observations Y given 9 and tp^ (p |p^, ^, T) = 7r(p^ |p+) njI!Ar(p+)+i / ^j(dX*] 
defines a (conditional) posterior distribution of p^ of the remaining N — n nega- 
tive observations given (p^, 6). The equality in ()35() follows from @ and re-writing 
= X^p+ Z^p- • Then, combining (|^ and gives 



?,T) 



(35) 



iV(p) 

J] A^(dX,*|Cfc)7r(p|^,T) 

k=l 



(X 



N(p) 

n v'T(dx;)v-(p-|p+,^,T) 

j=N {p+)+l 

'7V(p+) 

X 



n v'^+(dx*)V'+(p+ie,T) 



i=l 



(36) 



Theorem 2.1 in Ho (2006b) yields that the law of Xf, . . . ,X 



7V(p+) 



p+|6',T (pro- 



portional to the last term above) is equivalent to the law of U, S^|^,T defined 
by 1)12(1 and ()14|) . Utilizing the symmetric properties of vr(p"|p^) in Q and (|11() 
and applying Theorem 2.1 in Ho (2006b) yield the equivalence between the law of 
-'^Ar(p+)+i' • • • ' "'^iv(p)' P IP^' ^' Pi'oportional to the first term at the right hand 
side of and the law of V,S^|S^,^,T defined by ((T^ and (fT3)) . completing 

the proof of Theorem 12.11 The result in Theorem 12.21 follows as a result of Theo- 
rem by recognizing the equality in distribution between 'P((iG|X*, p, ^, T) and 
V{dG\Y,U,S-,S+,e,T) in (El. □ 
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Figure 1: The true unimodal density Ai(t) (solid line) and weighted av- 
erage density estimates given the mode = —1 produced by the SIP sam- 
pler (Steps (ii) and (iii) of Algorithm 13. 4|) based on a = and 6 = 1 (left 
column) and a = 0.9 and b = 100 (right column) for G. 
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Figure 2: The true unimodal density Ai(t) (solid line) and weighted av- 
erage density estimates given the mode 6 = —0.5 produced by the SIP 
sampler (Steps (ii) and (iii) of Algorithm 13. 4|) based on a = and b = 1 (left 
column) and a = 0.9 and b = 100 (right column) for G. 
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Figure 3: The true unimodal density Ai(t) (solid line) and weighted av- 
erage density estimates given the mode 6 = produced by the SIP sam- 
pler (Steps (ii) and (iii) of Algorithm 13. 4|) based on a = and 5 = 1 (left 
column) and a = 0.9 and b = 100 (right column) for G. 
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Figure 4: The true unimodal densities Ai(t) (solid lines) and weighted 
average density estimates produced by Algorithm 13.41 based on a A^(0, 1) 
density (left column) and a A^(0, 0.25^) density (right column) for p{0). 




Figure 5: The true unimodal densities A2(t) (solid lines) and weighted 
average density estimates produced by Algorithm 13.41 based on a A^(0, 1) 
density (left column) and a A^(0, 0.25^) density (right column) for p{0). 



Unimodal Density 



35 





Figure 6: The true unimodal densities A3(t) (solid lines) and weighted 
average density estimates produced by Algorithm 13.41 based on a A^(0, 1) 
density (left column) and a A^(0, 0.25^) density (right column) for p{0). 
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Figure 7: Histogram of 500 observations simulated from A3(t) and his- 
tograms of the resulting Bayes estimates of 6 by 2000 replications of Algo- 
rithm based on a iV(0, 1) density and a iV(0, 0.25^) density for p{9) (from 
top to bottom). 
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Figure 8: Histograms of the data simulated from unimodal densities 
Xi{t) (left column), X2(t) (middle column), and A3(t) (right column). 



